Weighted Evidence Accumulation Clustering Using Subsampling
Abstract
We introduce an approach based on evidence accumulation clustering (EAC) for combining the partitions in a clustering ensemble. EAC uses a voting mechanism to build a co-association matrix from the pairwise associations found in N partitions, with every partition given equal weight in the combination process. Applying a clustering algorithm to this co-association matrix yields the final data partition. In this paper we propose a clustering ensemble combination approach that uses subsampling and weights the partitions differently (WEACS). We weight each partition in two ways: SWEACS, which uses a single validation index, and JWEACS, which uses a committee of indices. We compare the combination results with those of the EAC technique and of the HGPA, MCLA and CSPA methods by Strehl and Ghosh, all using subsampling, and conclude that the WEACS approaches generally obtain better results. As a complementary step to the WEACS approach, we combine all the final data partitions produced by the different variations of the method and use the Ward Link algorithm to obtain the final data partition.
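The weighted voting described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `weighted_coassociation`, the `(indices, labels)` input layout, and the choice to normalise each pair by the total weight of the subsamples containing both items are all assumptions, since the abstract gives neither the exact formula nor the validation indices used to derive the weights.

```python
import numpy as np

def weighted_coassociation(n_samples, partitions, weights):
    """Weighted co-association matrix from partitions of subsamples.

    partitions: list of (indices, labels) pairs; `indices` are the sample ids
                drawn without replacement, `labels` their cluster labels.
    weights:    one non-negative weight per partition (hypothetical values;
                in WEACS they would come from validation indices).
    """
    votes = np.zeros((n_samples, n_samples))  # weighted "same cluster" votes
    seen = np.zeros((n_samples, n_samples))   # weighted co-occurrence counts
    for (indices, labels), w in zip(partitions, weights):
        idx = np.asarray(indices)
        lab = np.asarray(labels)
        same = (lab[:, None] == lab[None, :]).astype(float)
        block = np.ix_(idx, idx)
        votes[block] += w * same
        seen[block] += w
    # Pairs that never appeared together in a subsample keep the value 0.
    return np.divide(votes, seen, out=np.zeros_like(votes), where=seen > 0)

# Three partitions over four samples; the second one gets half the weight.
partitions = [
    ([0, 1, 2], [0, 0, 1]),
    ([1, 2, 3], [0, 0, 1]),
    ([0, 1, 2, 3], [0, 0, 1, 1]),
]
coassoc = weighted_coassociation(4, partitions, weights=[1.0, 0.5, 1.0])
print(np.round(coassoc, 3))
```

Setting all weights equal gives the unweighted vote of plain EAC under subsampling. The final partition would then come from running a clustering algorithm on `coassoc`, for example a hierarchical method applied to `1 - coassoc` as a dissimilarity.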
Similar resources
Definition of MV Load Diagrams via Weighted Evidence Accumulation Clustering using Subsampling
A definition of medium voltage (MV) load diagrams was made, based on the data base knowledge discovery process. Clustering techniques were used as support for the agents of the electric power retail markets to obtain specific knowledge of their customers’ consumption habits. Each customer class resulting from the clustering operation is represented by its load diagram. The Two-step clustering a...
Bilateral Weighted Fuzzy C-Means Clustering
Nowadays, the Fuzzy C-Means method has become one of the most popular clustering methods based on minimization of a criterion function. However, the performance of this clustering algorithm may be significantly degraded in the presence of noise. This paper presents a robust clustering algorithm called Bilateral Weighted Fuzzy C-Means (BWFCM). We used a new objective function that uses some k...
On the Consistency of k-means++ algorithm
We prove in this paper that the expected value of the objective function of the k-means++ algorithm on samples converges to the population expected value. Because k-means++ provides a constant-factor approximation of the k-means objective on samples, such an approximation can be achieved for the population as the sample size increases. This result is of potential practical relevance when one is...
ConsensusClusterPlus (Tutorial)
Consensus Clustering [1] is a method that provides quantitative evidence for determining the number and membership of possible clusters within a dataset, such as microarray gene expression. This method has gained popularity in cancer genomics, where new molecular subclasses of disease have been discovered [3, 4]. The Consensus Clustering method involves subsampling from a set of items, such as ...
Coresets for Nonparametric Estimation - the Case of DP-Means
Scalable training of Bayesian nonparametric models is a notoriously difficult challenge. We explore the use of coresets – a data summarization technique originating from computational geometry – for this task. Coresets are weighted subsets of the data such that models trained on these coresets are provably competitive with models trained on the full dataset. Coresets sublinear in the dataset si...
Publication date: 2006